1 Abstract

The COVID-19 pandemic has had a huge effect on people’s lives, both socially and economically, and its severity has varied with the characteristics of the affected populations. Media reports have observed that the pandemic has a bigger impact on older populations, populations with higher percentages of Black and Hispanic people, and populations with lower income. We explore the relationship between the COVID-19 death rate in US counties and their socioeconomic characteristics. Potentially relevant variables are examined both graphically and numerically. We include state as one of the variables in all of our analyses to account for state-related variability, so that we can concentrate on the effect of the county-level socioeconomic variables. We find that the data confirm the observations made in the media reports about disadvantaged populations. We use statistical machine learning methods to predict the county death rate from the income, jobs, people, and county classification information of the county, and compare the methods by their prediction accuracy on test data. The methods we use are a linear model with stepwise variable selection, LASSO and relaxed LASSO, random forests, boosting, and deep learning networks. The results of these prediction methods can provide valuable guidance on issues such as resource allocation in the future.

2 Introduction

The outbreak of the coronavirus disease 2019 (COVID-19) has been declared a global emergency by the World Health Organization (WHO). It has had over 30 million reported cases worldwide, with more than one million deaths. There have been over seven million cases and 200,000 deaths in the US alone. In response, governments have implemented travel restrictions, business and school closures, and other social distancing policies. The pandemic has affected every aspect of human society.

The impact of COVID-19 on a population can vary with the characteristics of that population. There are reports of different infection rates and fatality rates among different age groups, racial groups, and income groups. One of the most important and direct measurements of the impact is the death rate: the percentage of the total population that dies of the disease. In this project, we study the relationship between the COVID-19 death rate in a county and its socioeconomic characteristics. We obtained county-level data on income, jobs, people, and county classifications, as well as the cumulative infection and fatality numbers for each county in the US. We concentrate on the fatality data rather than the infection data because fatality is better defined and does not depend on factors such as whether tests are widely available in the area.

The socioeconomic county-level data include a large number of variables. After removing redundant and/or irrelevant variables, we included 41 potential explanatory variables in our analysis. We explore the variables with statistical plots and maps, and study their relation to the county death rate. Other variables, such as the phase of the pandemic the state is going through and the lockdown and social distancing policies of the state, can clearly also be strongly related to the county death rate. To account for differences between states and concentrate on the effect of county-level socioeconomic variables, we include state as an explanatory variable in our analysis; this helps to control for state-related variability such as timeline and state policy.

We explore the relationship of the explanatory variables with the county death rate graphically and numerically. We find evidence in the data that confirms the observations made in media reports that the COVID-19 death rate is higher in older populations, populations with higher percentages of Black and Hispanic people, and populations with lower income. We use machine learning methods to build models that predict county death rates from county socioeconomic characteristics, and compare the models by their prediction accuracy on test data. The methods include a linear model with stepwise variable selection, LASSO and relaxed LASSO, random forests, boosting, and deep learning neural networks. While this analysis only provides association and prediction, not cause and effect, the results can provide valuable guidance on issues such as resource allocation in the case of potential future waves of similar diseases.

3 Data Sources

The data comes from two different sources:

  1. County-level infection and fatality data - This dataset gives daily cumulative numbers on infection and fatality for each county.
  2. County-level socioeconomic data - The following are the five relevant datasets from this site.
    i. Income - Poverty level and household income
    ii. Jobs - Employment type, rate, and change
    iii. People - Population size, density, education level, race, age, household size, and migration rates
    iv. County Classifications - Type of county (rural or urban on a rural-urban continuum scale)
    v. Variable Name Lookup - Brief explanations of variable names

A detailed list of the variables is in the appendix.

4 Data Preparation and Exploration

4.1 Data Cleaning

We load the county-level income, jobs, people, and county classification data. For consistency, we rename the FIPStxt column of county.class to FIPS. The county FIPS code uniquely identifies each county in the US.

We merge all of the county-level socioeconomic datasets into one dataset by FIPS. We call the combined dataset countydata.
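The renaming and merging step can be sketched in base R as follows. This is only an illustration on made-up values: the table names county.income and county.jobs are assumed by analogy with county.class, and the real tables have many more columns.

```r
# Toy stand-ins for three of the county-level tables (values are made up).
county.income <- data.frame(FIPS = c(1001, 1003, 1005),
                            PerCapitaInc = c(27000, 31000, 18000))
county.jobs   <- data.frame(FIPS = c(1001, 1003, 1005),
                            UnempRate2019 = c(2.7, 2.8, 3.8))
county.class  <- data.frame(FIPStxt = c(1001, 1003, 1005),
                            RuralUrbanContinuumCode2013 = c(2, 3, 6))

# Rename FIPStxt to FIPS so that every table shares the same key column.
names(county.class)[names(county.class) == "FIPStxt"] <- "FIPS"

# Merge the tables one after another on the FIPS key.
countydata <- Reduce(function(x, y) merge(x, y, by = "FIPS"),
                     list(county.income, county.jobs, county.class))
print(countydata)
```

Merging with `Reduce` keeps the code the same no matter how many tables are combined, as long as each has one row per FIPS code.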

We look at all counties in the Continental US. We take out Hawaii, Alaska, and Puerto Rico, both because they are not part of the Continental US and because they have many missing values. We also remove Bedford, VA (FIPS 51515), because its FIPS code changed; the new FIPS is 51019.

We now load the county-level data on confirmed COVID-19 infections and fatalities. These are daily cumulative numbers. Counties with no confirmed COVID-19 infections were not included in the data. The following plot gives the total number of counties with confirmed cases by day.

We take the cumulative data on 8/19/2020 (the date when we downloaded the dataset) for further analysis. Since each state is going through a different phase of the pandemic, and has a different lockdown and reopening timeline and different state policies, it is important to include state as a control variable in our analysis, so that we can put counties in different states on an equal footing and concentrate on the effect of county-level socioeconomic variables.

There are 29 records in the COVID-19 data with unknown county; we discard them. As with the county data, we keep only counties in the Continental US, taking out Hawaii, Alaska, and Puerto Rico.

There are also three special records in the COVID-19 data: New York City, NY; Joplin, MO; and Kansas City, MO.

New York City has five boroughs that each count as a county in countydata, yet the COVID-19 data gives numbers only for New York City in its entirety. To make the COVID-19 data and countydata consistent in their treatment of New York City, we remove the New York City entry from the COVID-19 data and add the data for the five boroughs. We manually find the number of deaths and infections for each of the five boroughs around 8/19/2020 and input them into our data.

Joplin and Kansas City also consist of multiple counties (Joplin has Jasper and Newton; Kansas City has Jackson, Clay, Platte, and Cass), but the COVID-19 data already contains data on the counties. So we just remove Joplin and Kansas City from the COVID-19 data.

We merge the countydata and the COVID-19 data by FIPS. The combined data has 3109 US continental counties and 211 variables.

Now we go through the county socioeconomic variables. We remove the variables that are clearly not relevant. Where variables are highly correlated and close in meaning, we keep only one. For example, we remove TotalHH (the total number of households in a county) because its correlation with TotalPopEst2019 (the total population) is 0.996. We also remove MedHHInc in favor of PerCapitaInc, and HH65PlusAlonePct in favor of Age65AndOlderPct2010.
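A screen for highly correlated pairs can be sketched as below. The data are simulated, and the 0.95 threshold is a hypothetical cutoff chosen for the illustration, not a value taken from this report; the final choice of which variable to keep was made by judgment, as described above.

```r
set.seed(1)
n <- 200
# Simulated counties: households are roughly proportional to population.
TotalPopEst2019 <- round(exp(rnorm(n, mean = 10, sd = 1.5)))
TotalHH         <- round(TotalPopEst2019 / 2.5 * exp(rnorm(n, sd = 0.05)))
PerCapitaInc    <- rnorm(n, mean = 27000, sd = 6000)
toy <- data.frame(TotalPopEst2019, TotalHH, PerCapitaInc)

cor_mat   <- cor(toy)
threshold <- 0.95   # hypothetical cutoff

# Look only at the upper triangle so that each pair is listed once.
high <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(var1 = rownames(cor_mat)[high[, "row"]],
                         var2 = colnames(cor_mat)[high[, "col"]],
                         r    = cor_mat[high])
print(high_pairs)
```

On this simulated data the only flagged pair is TotalPopEst2019 and TotalHH, which mirrors the situation described in the text.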

We only take the latest version of each variable. For example, the unemployment rate appears as UnempRate2010, UnempRate2011, …; we remove the earlier years and keep only UnempRate2019.

Sometimes a group of variables always adds up to a fixed total, so one of them is redundant and we remove it. For example, Ed1LessThanHSPct, Ed2HSDiplomaOnlyPct, Ed3SomeCollegePct, Ed4AssocDegreePct, and Ed5CollegePlusPct are percentages that add up to 100, so we remove Ed1LessThanHSPct (the percentage of the population with less than a high school education) from our variable set.

Many of the variables from the county classification data are categorical, but the other county datasets already have continuous variables for them. For instance, county.people has different education levels, but county.class has Low_Education_2015_update, which classifies a county as low-education. So we remove Low_Education_2015_update since it is less informative. Similarly, UrbanInfluenceCode2013 provides more refined information than each of Noncore2013, Micropolitan2013, Nonmetro2013, Metro2013, Metro_Adjacent2013. So we remove the latter variables.

Since TotalPop25Plus is the total number of people aged 25 and over, we create a more relevant variable: the proportion of the population that is 25 and over. Because TotalPop25Plus is an average over 5 years, we take the average of TotalPopEst from 2014 to 2018 as the denominator when calculating the proportion.
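The calculation can be sketched as follows. The column names TotalPopEst2014 to TotalPopEst2018 are assumed by analogy with TotalPopEst2019, and the counts are made up. The result is stored as a proportion, consistent with the range of Over25Pct2018 in the summary tables.

```r
# Toy population counts for two counties (values are made up).
people <- data.frame(
  FIPS            = c(1001, 1003),
  TotalPop25Plus  = c(37000, 150000),
  TotalPopEst2014 = c(54900, 199000),
  TotalPopEst2015 = c(54900, 202000),
  TotalPopEst2016 = c(55200, 207000),
  TotalPopEst2017 = c(55400, 212000),
  TotalPopEst2018 = c(55600, 218000)
)

# Average the yearly estimates that cover the same 5-year window.
pop_cols <- paste0("TotalPopEst", 2014:2018)
avg_pop  <- rowMeans(people[, pop_cols])

# Proportion of the population aged 25 and over.
people$Over25Pct2018 <- people$TotalPop25Plus / avg_pop
print(people[, c("FIPS", "Over25Pct2018")])
```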

We fill in the missing values for infections and deaths with zero. A county that is missing from the COVID-19 data simply has not yet had any confirmed cases (as of 8/19/2020).
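A minimal sketch of this step, on made-up values: a left merge keeps every county in countydata, and counties absent from the COVID-19 table get zero cases and deaths. The column names cases and deaths follow the summary output; the rest is assumed for illustration.

```r
# Toy tables: county 1003 has no record in the COVID-19 data.
countydata <- data.frame(FIPS = c(1001, 1003, 1005),
                         TotalPopEst2019 = c(55000, 220000, 25000))
covid      <- data.frame(FIPS = c(1001, 1005),
                         cases = c(1500, 700),
                         deaths = c(25, 7))

# all.x = TRUE keeps counties that have no match in the COVID-19 table.
merged <- merge(countydata, covid, by = "FIPS", all.x = TRUE)

# A missing record means no confirmed cases yet, so fill with zero.
merged$cases[is.na(merged$cases)]   <- 0
merged$deaths[is.na(merged$deaths)] <- 0
print(merged)
```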

Here is a summary of all the remaining variables. There are still a very small number of missing values; we will remove the counties with missing values before we move on to data modeling, but these counties are kept for now for the sake of the graphs of the variables that do not involve missing values.

##     County             state                fips           cases       
##  Length:3109        Length:3109        Min.   : 1001   Min.   :     0  
##  Class :character   Class :character   1st Qu.:19043   1st Qu.:    73  
##  Mode  :character   Mode  :character   Median :29211   Median :   260  
##                                        Mean   :30666   Mean   :  1757  
##                                        3rd Qu.:46007   3rd Qu.:   872  
##                                        Max.   :56045   Max.   :225827  
##                                                                        
##      deaths         Deep_Pov_All    PovertyAllAgesPct  PerCapitaInc  
##  Min.   :   0.00   Min.   : 0.000   Min.   : 2.60     Min.   :10148  
##  1st Qu.:   1.00   1st Qu.: 4.469   1st Qu.:10.90     1st Qu.:22750  
##  Median :   4.00   Median : 6.109   Median :14.20     Median :26216  
##  Mean   :  53.85   Mean   : 6.685   Mean   :15.18     Mean   :26980  
##  3rd Qu.:  20.00   3rd Qu.: 8.038   3rd Qu.:18.30     3rd Qu.:29986  
##  Max.   :5981.00   Max.   :33.183   Max.   :54.00     Max.   :72832  
##                    NA's   :1                          NA's   :1      
##  UnempRate2019      PctEmpFIRE     PctEmpConstruction  PctEmpTrans    
##  Min.   : 0.700   Min.   : 0.000   Min.   : 0.000     Min.   : 0.000  
##  1st Qu.: 3.000   1st Qu.: 3.347   1st Qu.: 5.786     1st Qu.: 4.167  
##  Median : 3.700   Median : 4.283   Median : 7.051     Median : 5.238  
##  Mean   : 3.963   Mean   : 4.569   Mean   : 7.339     Mean   : 5.512  
##  3rd Qu.: 4.600   3rd Qu.: 5.436   3rd Qu.: 8.581     3rd Qu.: 6.498  
##  Max.   :18.300   Max.   :20.603   Max.   :25.532     Max.   :22.487  
##                   NA's   :1        NA's   :1          NA's   :1       
##   PctEmpMining       PctEmpTrade     PctEmpInformation PctEmpAgriculture
##  Min.   : 0.00000   Min.   : 0.838   Min.   : 0.0000   Min.   : 0.000   
##  1st Qu.: 0.08058   1st Qu.:12.248   1st Qu.: 0.8615   1st Qu.: 1.154   
##  Median : 0.30194   Median :13.840   Median : 1.2860   Median : 2.866   
##  Mean   : 1.58556   Mean   :13.695   Mean   : 1.3678   Mean   : 5.089   
##  3rd Qu.: 1.27458   3rd Qu.:15.277   3rd Qu.: 1.7308   3rd Qu.: 6.294   
##  Max.   :44.03561   Max.   :38.889   Max.   :12.3288   Max.   :59.649   
##  NA's   :1          NA's   :1        NA's   :1         NA's   :1        
##  PctEmpManufacturing PctEmpServices     PctEmpGovt     PopDensity2010    
##  Min.   : 0.000      Min.   : 8.333   Min.   : 0.000   Min.   :    0.12  
##  1st Qu.: 6.858      1st Qu.:38.291   1st Qu.: 3.513   1st Qu.:   17.63  
##  Median :11.434      Median :42.768   Median : 4.733   Median :   45.64  
##  Mean   :12.343      Mean   :42.994   Mean   : 5.505   Mean   :  264.36  
##  3rd Qu.:16.724      3rd Qu.:47.539   3rd Qu.: 6.489   3rd Qu.:  115.04  
##  Max.   :48.024      Max.   :81.589   Max.   :33.500   Max.   :69468.42  
##  NA's   :1           NA's   :1        NA's   :1                          
##    OwnHomePct    Age65AndOlderPct2010 Over25Pct2018    Under18Pct2010 
##  Min.   :19.61   Min.   : 3.73        Min.   :0.3847   Min.   : 9.11  
##  1st Qu.:67.71   1st Qu.:13.19        1st Qu.:0.6662   1st Qu.:21.42  
##  Median :72.67   Median :15.60        Median :0.6911   Median :23.31  
##  Mean   :71.53   Mean   :15.95        Mean   :0.6887   Mean   :23.41  
##  3rd Qu.:77.06   3rd Qu.:18.25        3rd Qu.:0.7154   3rd Qu.:25.09  
##  Max.   :92.40   Max.   :43.38        Max.   :0.9600   Max.   :40.13  
##                                                                       
##  Ed2HSDiplomaOnlyPct Ed3SomeCollegePct Ed4AssocDegreePct Ed5CollegePlusPct
##  Min.   : 5.47       Min.   : 4.116    Min.   : 1.116    Min.   : 0.00    
##  1st Qu.:29.81       1st Qu.:19.197    1st Qu.: 7.148    1st Qu.:14.99    
##  Median :34.57       Median :21.664    Median : 8.663    Median :19.22    
##  Mean   :34.28       Mean   :21.780    Mean   : 8.930    Mean   :21.56    
##  3rd Qu.:39.28       3rd Qu.:24.108    3rd Qu.:10.535    3rd Qu.:25.52    
##  Max.   :55.62       Max.   :38.667    Max.   :21.397    Max.   :78.53    
##                                                                           
##  ForeignBornPct   Net_International_Migration_Rate_2010_2019
##  Min.   : 0.000   Min.   :-1.2450                           
##  1st Qu.: 1.346   1st Qu.: 0.0890                           
##  Median : 2.711   Median : 0.3820                           
##  Mean   : 4.681   Mean   : 0.8748                           
##  3rd Qu.: 5.671   3rd Qu.: 1.0310                           
##  Max.   :53.254   Max.   :20.4030                           
##                                                             
##  NetMigrationRate1019 NaturalChangeRate1019 TotalPopEst2019   
##  Min.   :-32.17900    Min.   :-11.0250      Min.   :     169  
##  1st Qu.: -4.08100    1st Qu.: -1.4230      1st Qu.:   11131  
##  Median : -1.18400    Median :  0.4920      Median :   26118  
##  Mean   :  0.01595    Mean   :  0.9355      Mean   :  105113  
##  3rd Qu.:  3.15200    3rd Qu.:  2.8640      3rd Qu.:   68238  
##  Max.   :115.58000    Max.   : 23.0850      Max.   :10039107  
##                                                               
##  WhiteNonHispanicPct2010 NativeAmericanNonHispanicPct2010
##  Min.   : 2.80           Min.   : 0.00                   
##  1st Qu.:67.29           1st Qu.: 0.19                   
##  Median :85.94           Median : 0.30                   
##  Mean   :78.62           Mean   : 1.59                   
##  3rd Qu.:94.27           3rd Qu.: 0.61                   
##  Max.   :99.16           Max.   :94.10                   
##                                                          
##  BlackNonHispanicPct2010 AsianNonHispanicPct2010 HispanicPct2010 
##  Min.   : 0.000          Min.   : 0.000          Min.   : 0.000  
##  1st Qu.: 0.410          1st Qu.: 0.270          1st Qu.: 1.590  
##  Median : 1.940          Median : 0.460          Median : 3.290  
##  Mean   : 8.842          Mean   : 1.063          Mean   : 8.329  
##  3rd Qu.:10.020          3rd Qu.: 0.970          3rd Qu.: 8.290  
##  Max.   :85.440          Max.   :33.000          Max.   :95.740  
##                                                                  
##  Type_2015_Update RuralUrbanContinuumCode2013 UrbanInfluenceCode2013
##  Min.   :0.000    Min.   :1.000               Min.   : 1.000        
##  1st Qu.:0.000    1st Qu.:2.000               1st Qu.: 2.000        
##  Median :1.000    Median :6.000               Median : 5.000        
##  Mean   :1.792    Mean   :4.987               Mean   : 5.225        
##  3rd Qu.:3.000    3rd Qu.:7.000               3rd Qu.: 8.000        
##  Max.   :5.000    Max.   :9.000               Max.   :12.000        
##  NA's   :1        NA's   :1                   NA's   :1             
##  Perpov_1980_0711 HiCreativeClass2000   HiAmenity     
##  Min.   :0.0000   Min.   :0.0000      Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000      1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000      Median :0.0000  
##  Mean   :0.1129   Mean   :0.2485      Mean   :0.2498  
##  3rd Qu.:0.0000   3rd Qu.:0.0000      3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000      Max.   :1.0000  
##  NA's   :1        NA's   :2           NA's   :3       
##  Retirement_Destination_2015_Update
##  Min.   :0.0000                    
##  1st Qu.:0.0000                    
##  Median :0.0000                    
##  Mean   :0.1416                    
##  3rd Qu.:0.0000                    
##  Max.   :1.0000                    
##  NA's   :1

4.2 Exploratory Data Analysis

4.2.1 US Maps of Relevant Variables

We map a few key variables of interest. This gives us an intuitive understanding of the geographic distribution of county populations with different social and economic characteristics, as well as the infection and death rates across the counties of the continental US. We include both state-level and county-level maps.

In the first map above, we show the total number of deaths by state. In the second, we show the death rate (number of deaths per 100,000 people) by state. There is a clear difference between the two: in the first, New York appears to have the most deaths, but in the second, we see that New Jersey actually has proportionally more. The death rate is the more appropriate variable to look at here because it takes the total population of a state into account.
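The difference between totals and rates can be reproduced on a toy example. The states A and B and their numbers below are hypothetical; the sketch only shows how county counts aggregate to a state-level rate per 100,000 people.

```r
# Toy county records for two hypothetical states.
counties <- data.frame(state  = c("A", "A", "B", "B"),
                       deaths = c(10, 30, 5, 15),
                       pop    = c(100000, 300000, 20000, 30000))

# Sum deaths and population within each state, then convert to a rate.
by_state <- aggregate(cbind(deaths, pop) ~ state, data = counties, FUN = sum)
by_state$deaths_per_100k <- 1e5 * by_state$deaths / by_state$pop
print(by_state)
```

State A has twice as many deaths as state B, but state B has the higher death rate, which is the same pattern as in the two maps.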

Here we have an interactive map on the racial composition of state populations. The reader can see the racial composition of different US continental states on the map by hovering their mouse over the state.

Next is an interactive map on the education levels for US states.

Below is a map for the per capita income of the US states.

And now we map the infection and death rates by state.

These next two maps are more detailed. They give the number and rate of deaths at the county level, allowing us to see variation in the death rate of counties within the same state.

The following is a map on infection rate (number of infected per 100,000 people).

The graph below is an interactive map that gives the FIPS and death rate of each county.

This is an interactive map for population density at the county level.

4.2.2 Statistical Distributions of the Variables and Variable Transformations

Here are boxplots of county death rates by state, grouped into the US’s regions of West, Northeast, Midwest, Southeast, and Southwest.

From the boxplots, we can see that in each state, the distribution of the county death rate is highly skewed to the right, with many extreme outliers. This is also true for most of the explanatory variables. We proceed to transform the variables so that their distributions conform more closely to the statistical model assumptions.

First we look at the county death rate, which is the proportion of COVID-19 deaths among the county population. Instead of measuring the proportion as (number of COVID-19 deaths in county)/(county total population size), we use the slightly modified formula (number of COVID-19 deaths in county + 1)/(county total population size + 2). The two formulas give very similar numbers, but the latter avoids a cluster-at-zero problem for the variable, and has a simple interpretation as the Bayes estimate (posterior mean) of a proportion under a uniform prior distribution. The resulting variable distribution is still very skewed, but a log transformation makes the distribution look reasonably normal. We use the log-transformed version of the variable (log_death_rate) as the response variable in our predictive models.

Next we look at the explanatory variables. We look at histograms of each variable and perform the appropriate transformations to make them less skewed. We use a log transformation on five explanatory variables (PovertyAllAgesPct, PerCapitaInc, UnempRate2019, PopDensity2010, TotalPopEst2019). These are highly skewed variables, and the log transformation helps make their distributions less skewed. For example, below are the histograms of PerCapitaInc and log(PerCapitaInc). We can see that the log transformation makes the distribution much more symmetric.

For highly skewed variables that take nonpositive values, a log transformation cannot be applied. For such variables, we instead cap the variable at the 99.5th percentile (with the exception of a couple of extremely skewed variables, which we choose to cap at the 98th percentile). We keep the rest of the variables at their original scale.
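The two versions of the rate can be compared on toy counts (the numbers below are made up, not real counties). With a uniform prior on the proportion and d deaths among n people, the posterior is Beta(d + 1, n - d + 1), whose mean is (d + 1)/(n + 2).

```r
# Toy counts: deaths and population for four hypothetical counties.
deaths <- c(0, 1, 4, 20)
pop    <- c(500, 11000, 26000, 68000)

raw_rate      <- deaths / pop               # exactly 0 when there are no deaths
smoothed_rate <- (deaths + 1) / (pop + 2)   # posterior mean under a uniform prior

print(log(raw_rate))                        # first entry is -Inf
log_death_rate <- log(smoothed_rate)        # finite for every county
print(round(log_death_rate, 3))
```

The log of the raw rate is not finite for a county with zero deaths, whereas the smoothed version always is, so every county can be kept in the model.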
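Both transformations can be sketched on simulated data. The helper cap_upper is a hypothetical name introduced here, and only the upper tail is capped in this sketch; a lower cap would use pmax in the same way.

```r
set.seed(2)
# Simulated data: a strictly positive skewed variable, and a skewed
# variable that contains zero (so its log is not finite everywhere).
income   <- exp(rnorm(1000, mean = 10.2, sd = 0.25))
raw_fire <- c(0, rexp(999, rate = 1 / 4.5))

# Log transformation for the strictly positive variable.
log_income <- log(income)

# Cap a variable at a given upper quantile (hypothetical helper).
cap_upper <- function(x, prob) {
  cutoff <- unname(quantile(x, probs = prob, na.rm = TRUE))
  pmin(x, cutoff)
}
capped_fire <- cap_upper(raw_fire, 0.995)

print(summary(raw_fire))
print(summary(capped_fire))
```

Capping changes only the few largest values and leaves the rest of the distribution untouched, which keeps the variable on its original scale.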

4.2.3 Pairwise Plots of Explanatory Variables

For the transformed explanatory variables, we plot pairwise scatterplots of related county socioeconomic variables.

4.2.4 Summary of Variables Included in Modelling

There are still a very small number of counties with missing values that we kept for the sake of the graphs. We remove them now before moving on to data modeling.

The final data have 3104 counties and 41 predictor variables. Two of the variables (state and Type_2015_Update) are categorical; the others are continuous. The following is a summary of all the predictor variables and the response variable. The variables are shown after transformation, and there are no missing values.

##      state       Deep_Pov_All    PovertyAllAgesPct  PerCapitaInc   
##  TX     : 254   Min.   : 0.000   Min.   :0.9555    Min.   : 9.225  
##  GA     : 159   1st Qu.: 4.469   1st Qu.:2.3888    1st Qu.:10.032  
##  VA     : 133   Median : 6.104   Median :2.6462    Median :10.174  
##  KY     : 120   Mean   : 6.659   Mean   :2.6432    Mean   :10.176  
##  MO     : 115   3rd Qu.: 8.032   3rd Qu.:2.9069    3rd Qu.:10.308  
##  KS     : 105   Max.   :20.229   Max.   :3.9890    Max.   :11.196  
##  (Other):2218                                                      
##  UnempRate2019       PctEmpFIRE     PctEmpConstruction  PctEmpTrans    
##  Min.   :-0.3567   Min.   : 0.000   Min.   : 0.000     Min.   : 0.000  
##  1st Qu.: 1.0986   1st Qu.: 3.346   1st Qu.: 5.788     1st Qu.: 4.172  
##  Median : 1.3083   Median : 4.283   Median : 7.053     Median : 5.241  
##  Mean   : 1.3248   Mean   : 4.554   Mean   : 7.330     Mean   : 5.504  
##  3rd Qu.: 1.5261   3rd Qu.: 5.428   3rd Qu.: 8.584     3rd Qu.: 6.498  
##  Max.   : 2.9069   Max.   :11.794   Max.   :15.775     Max.   :13.118  
##                                                                        
##   PctEmpMining       PctEmpTrade     PctEmpInformation PctEmpAgriculture
##  Min.   : 0.00000   Min.   : 0.838   Min.   :0.0000    Min.   : 0.000   
##  1st Qu.: 0.08077   1st Qu.:12.254   1st Qu.:0.8611    1st Qu.: 1.159   
##  Median : 0.30227   Median :13.842   Median :1.2854    Median : 2.874   
##  Mean   : 1.46362   Mean   :13.681   Mean   :1.3578    Mean   : 5.063   
##  3rd Qu.: 1.27543   3rd Qu.:15.283   3rd Qu.:1.7297    3rd Qu.: 6.297   
##  Max.   :13.09628   Max.   :21.662   Max.   :4.5974    Max.   :36.493   
##                                                                         
##  PctEmpManufacturing PctEmpServices     PctEmpGovt     PopDensity2010  
##  Min.   : 0.000      Min.   : 8.333   Min.   : 0.000   Min.   :-2.120  
##  1st Qu.: 6.876      1st Qu.:38.285   1st Qu.: 3.513   1st Qu.: 2.869  
##  Median :11.451      Median :42.756   Median : 4.733   Median : 3.820  
##  Mean   :12.332      Mean   :42.955   Mean   : 5.484   Mean   : 3.809  
##  3rd Qu.:16.728      3rd Qu.:47.516   3rd Qu.: 6.486   3rd Qu.: 4.741  
##  Max.   :34.012      Max.   :63.454   Max.   :18.285   Max.   :11.149  
##                                                                        
##    OwnHomePct    Age65AndOlderPct2010 Over25Pct2018    Under18Pct2010 
##  Min.   :19.61   Min.   : 3.73        Min.   :0.3847   Min.   : 9.11  
##  1st Qu.:67.74   1st Qu.:13.21        1st Qu.:0.6661   1st Qu.:21.43  
##  Median :72.68   Median :15.61        Median :0.6912   Median :23.31  
##  Mean   :71.56   Mean   :15.95        Mean   :0.6888   Mean   :23.41  
##  3rd Qu.:77.07   3rd Qu.:18.26        3rd Qu.:0.7154   3rd Qu.:25.09  
##  Max.   :92.40   Max.   :43.38        Max.   :0.9600   Max.   :40.13  
##                                                                       
##  Ed2HSDiplomaOnlyPct Ed3SomeCollegePct Ed4AssocDegreePct Ed5CollegePlusPct
##  Min.   : 5.47       Min.   : 4.116    Min.   : 1.116    Min.   : 0.00    
##  1st Qu.:29.84       1st Qu.:19.203    1st Qu.: 7.149    1st Qu.:14.99    
##  Median :34.58       Median :21.666    Median : 8.665    Median :19.22    
##  Mean   :34.31       Mean   :21.785    Mean   : 8.934    Mean   :21.52    
##  3rd Qu.:39.28       3rd Qu.:24.109    3rd Qu.:10.541    3rd Qu.:25.50    
##  Max.   :55.62       Max.   :38.667    Max.   :21.397    Max.   :78.53    
##                                                                           
##  ForeignBornPct   Net_International_Migration_Rate_2010_2019
##  Min.   : 0.000   Min.   :-1.2450                           
##  1st Qu.: 1.345   1st Qu.: 0.0880                           
##  Median : 2.706   Median : 0.3790                           
##  Mean   : 4.643   Mean   : 0.8487                           
##  3rd Qu.: 5.664   3rd Qu.: 1.0245                           
##  Max.   :32.343   Max.   : 8.4926                           
##                                                             
##  NetMigrationRate1019 NaturalChangeRate1019 TotalPopEst2019 
##  Min.   :-16.7230     Min.   :-11.0250      Min.   : 5.130  
##  1st Qu.: -4.0812     1st Qu.: -1.4263      1st Qu.: 9.316  
##  Median : -1.1845     Median :  0.4845      Median :10.167  
##  Mean   : -0.0506     Mean   :  0.9287      Mean   :10.284  
##  3rd Qu.:  3.1402     3rd Qu.:  2.8530      3rd Qu.:11.128  
##  Max.   : 30.8751     Max.   : 23.0850      Max.   :16.122  
##                                                             
##  WhiteNonHispanicPct2010 NativeAmericanNonHispanicPct2010
##  Min.   : 2.80           Min.   : 0.000                  
##  1st Qu.:67.31           1st Qu.: 0.190                  
##  Median :85.98           Median : 0.300                  
##  Mean   :78.67           Mean   : 1.127                  
##  3rd Qu.:94.27           3rd Qu.: 0.610                  
##  Max.   :99.16           Max.   :15.971                  
##                                                          
##  BlackNonHispanicPct2010 AsianNonHispanicPct2010 HispanicPct2010 
##  Min.   : 0.0000         Min.   : 0.00           Min.   : 0.000  
##  1st Qu.: 0.4075         1st Qu.: 0.27           1st Qu.: 1.590  
##  Median : 1.9400         Median : 0.46           Median : 3.280  
##  Mean   : 8.8201         Mean   : 1.03           Mean   : 8.308  
##  3rd Qu.:10.0025         3rd Qu.: 0.96           3rd Qu.: 8.242  
##  Max.   :85.4400         Max.   :14.19           Max.   :95.740  
##                                                                  
##  Type_2015_Update RuralUrbanContinuumCode2013 UrbanInfluenceCode2013
##  0:1228           Min.   :1.00                Min.   : 1.000        
##  1: 444           1st Qu.:2.00                1st Qu.: 2.000        
##  2: 218           Median :6.00                Median : 5.000        
##  3: 496           Mean   :4.99                Mean   : 5.228        
##  4: 397           3rd Qu.:7.00                3rd Qu.: 8.000        
##  5: 321           Max.   :9.00                Max.   :12.000        
##                                                                     
##  Perpov_1980_0711 HiCreativeClass2000   HiAmenity     
##  Min.   :0.0000   Min.   :0.0000      Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000      1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000      Median :0.0000  
##  Mean   :0.1131   Mean   :0.2481      Mean   :0.2497  
##  3rd Qu.:0.0000   3rd Qu.:0.0000      3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000      Max.   :1.0000  
##                                                       
##  Retirement_Destination_2015_Update log_death_rate   
##  Min.   :0.0000                     Min.   :-11.535  
##  1st Qu.:0.0000                     1st Qu.: -9.046  
##  Median :0.0000                     Median : -8.359  
##  Mean   :0.1414                     Mean   : -8.338  
##  3rd Qu.:0.0000                     3rd Qu.: -7.611  
##  Max.   :1.0000                     Max.   : -5.142  
## 

5 Relation between log_death_rate and Age, Race, and Income

There have been reports that the COVID-19 pandemic has had a deadlier impact on older populations, populations with higher percentages of Black and Hispanic people, and populations with lower income levels. In this section we look into the relation between log_death_rate and these variables at the county level.

5.1 log_death_rate and Age

We explore the relationship between age and log_death_rate. First we plot log_death_rate against Age65AndOlderPct2010 with a fitted linear model line. The fitted slope is slightly negative but not statistically significant (p-value 0.912).

## 
## Call:
## lm(formula = log_death_rate ~ Age65AndOlderPct2010, data = data1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1994 -0.7093 -0.0212  0.7275  3.1953 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -8.3298623  0.0727509 -114.50   <2e-16 ***
## Age65AndOlderPct2010 -0.0004867  0.0044145   -0.11    0.912    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.017 on 3102 degrees of freedom
## Multiple R-squared:  3.918e-06,  Adjusted R-squared:  -0.0003185 
## F-statistic: 0.01215 on 1 and 3102 DF,  p-value: 0.9122

However, when we include state as an explanatory variable and fit a linear model with the two variables state and Age65AndOlderPct2010, the coefficient of Age65AndOlderPct2010 is positive and highly statistically significant (p-value < 0.000003). That is, once we account for differences between states, including different social distancing policies and the different phases of the pandemic the states are going through, there is a positive relationship between log_death_rate and the percentage of older people (age 65 and older) in the county population. This highlights the importance of including the state variable in the analysis.

## Analysis of Variance Table
## 
## Response: log_death_rate
##                        Df Sum Sq Mean Sq F value    Pr(>F)    
## state                  47 1005.3 21.3900  29.888 < 2.2e-16 ***
## Age65AndOlderPct2010    1   15.8 15.7991  22.076 2.736e-06 ***
## Residuals            3055 2186.4  0.7157                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
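The effect of controlling for state can be illustrated on simulated data. In the sketch below, the three states and all numbers are hypothetical and the pattern is deliberately exaggerated: states with older populations have lower baseline death rates, while within each state the death rate rises with the share of older people.

```r
set.seed(3)
n_per_state <- 100
states     <- c("S1", "S2", "S3")                    # hypothetical states
state_base <- c(S1 = -7.5, S2 = -8.5, S3 = -9.5)     # baseline log death rate
state_age  <- c(S1 = 12, S2 = 16, S3 = 20)           # mean share aged 65+

state <- rep(states, each = n_per_state)
age65 <- rnorm(length(state), mean = unname(state_age[state]), sd = 2)
# Within each state the true slope on age is +0.1.
log_rate <- unname(state_base[state]) +
  0.1 * (age65 - unname(state_age[state])) +
  rnorm(length(state), sd = 0.3)
toy <- data.frame(state = state, Age65AndOlderPct2010 = age65,
                  log_death_rate = log_rate, stringsAsFactors = FALSE)

fit_age   <- lm(log_death_rate ~ Age65AndOlderPct2010, data = toy)
fit_state <- lm(log_death_rate ~ state + Age65AndOlderPct2010, data = toy)

print(coef(fit_age)["Age65AndOlderPct2010"])     # pooled slope, confounded by state
print(coef(fit_state)["Age65AndOlderPct2010"])   # within-state slope
print(anova(fit_state))
```

In this simulation the pooled slope is negative even though the within-state slope is positive, which is why a model without state can hide or reverse a county-level relationship.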

The following is a scatter plot of log_death_rate vs Age65AndOlderPct2010, with counties in different states colored differently. We also include the fitted lines for the states California, New York, and Wyoming for illustration.

5.2 log_death_rate and Race

We study the relation between race and death rate. First, we plot fitted linear lines and see that there is a positive correlation between log_death_rate and the percentage of the population that is black or hispanic.

When we account for state and fit linear models with state together with the percentage of black people, the percentage of hispanic people, or both, the relationship remains positive and highly significant, which suggests that these two populations are indeed disproportionately affected by COVID-19.

## Analysis of Variance Table
## 
## Response: log_death_rate
##                           Df  Sum Sq Mean Sq F value    Pr(>F)    
## BlackNonHispanicPct2010    1  515.61  515.61 756.376 < 2.2e-16 ***
## state                     47  609.32   12.96  19.018 < 2.2e-16 ***
## Residuals               3055 2082.55    0.68                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Analysis of Variance Table
## 
## Response: log_death_rate
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## HispanicPct2010    1  118.51 118.507 169.718 < 2.2e-16 ***
## state             47  955.80  20.336  29.124 < 2.2e-16 ***
## Residuals       3055 2133.18   0.698                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Analysis of Variance Table
## 
## Response: log_death_rate
##                           Df  Sum Sq Mean Sq F value    Pr(>F)    
## BlackNonHispanicPct2010    1  515.61  515.61 785.645 < 2.2e-16 ***
## HispanicPct2010            1  178.48  178.48 271.950 < 2.2e-16 ***
## state                     47  509.08   10.83  16.504 < 2.2e-16 ***
## Residuals               3054 2004.31    0.66                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.3 log_death_rate and Income

We also explore the relationship between income and log_death_rate using the same analysis. First we plot log_death_rate against PerCapitaInc with a fitted linear model line, which has a negative slope. When we include the state variable, the coefficient of PerCapitaInc remains negative and significant (p-value 3.1e-06). This is consistent with the observation that low-income populations are impacted more by the pandemic.

## Analysis of Variance Table
## 
## Response: log_death_rate
##                Df  Sum Sq Mean Sq F value    Pr(>F)    
## state          47 1005.33 21.3900  29.886 < 2.2e-16 ***
## PerCapitaInc    1   15.63 15.6341  21.844 3.086e-06 ***
## Residuals    3055 2186.52  0.7157                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6 Predictive Modeling

Now we use machine learning methods on our data to predict our response variable log_death_rate. We first split our data into a training data set (n=2069, ~67%) and a testing data set (n=1035, ~33%). The training set is then split into a pure training data set (n=1448, ~70% of training, ~47% of total) and a validation data set (n=621, ~30% of training, ~20% of total). The pure training and validation data are used to build and tune the models, and the testing data are used to evaluate the prediction accuracy of the models.
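
The split sizes above can be reproduced with a few lines of code. The following is a minimal sketch in Python, not the code used in the report; the function name, the seed, and the exact fractions (1/3 for testing, 0.3 of the remainder for validation) are assumptions that happen to reproduce the reported counts for 3104 counties.

```python
import random

def three_way_split(n, test_frac=1 / 3, val_frac=0.3, seed=1):
    """Shuffle row indices and split into pure-train / validation / test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # the seed is an arbitrary choice
    n_test = round(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    n_val = round(len(train) * val_frac)
    val, pure_train = train[:n_val], train[n_val:]
    return pure_train, val, test

pure_train, val, test = three_way_split(3104)
print(len(pure_train), len(val), len(test))  # 1448 621 1035
```

Keeping the test indices out of every tuning step is what makes the test MSE an honest estimate of out-of-sample error.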

6.1 Null Model

In the case with no explanatory variables, the predicted response based on the training data is the mean of the responses in the training set. We compute the mean squared error (MSE) of this predicted response on the testing data set, and the test MSE is 1.047.
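
As a small worked example of this baseline, the sketch below (Python, with made-up response values that are not from the report) predicts every test response by the training mean and computes the resulting test MSE. The helper names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def null_model_test_mse(y_train, y_test):
    """Predict every test response by the training-set mean."""
    mean_train = sum(y_train) / len(y_train)
    return mse(y_test, [mean_train] * len(y_test))

# Hypothetical log death rates, for illustration only.
y_train = [-8.0, -9.0, -7.0, -8.0]
y_test = [-8.5, -7.5]
print(null_model_test_mse(y_train, y_test))  # 0.25
```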

6.2 Linear Models

Our exploration above shows that it is important to include state in the analysis to account for differences between states. We therefore first look at the linear model with state as the only explanatory variable and treat it as our baseline model. We also look at a few other linear models that combine state with one particularly important group of variables: age, race, or income. The baseline model with only state as predictor, fitted on the training data, has a mean squared error (MSE) of 0.73 on the test data. For the model with state and age as predictors, the test MSE is only very slightly smaller (0.727, compared with 0.730 for the baseline). For the model with state and race as predictors, the test MSE is 0.65. For the model with state and income as predictors, the test MSE is 0.72.

6.3 Linear Model with Forward Stepwise Variable Selection

Next we build a linear model with forward stepwise variable selection. Because we have a large number of explanatory variables, searching all subsets is not feasible, so we use forward stepwise selection instead. We force the state variable into the model. At each subsequent step, we add the variable that gives the smallest residual sum of squares when added to the current model, and we continue adding variables sequentially in this way. We use Mallows' Cp as the criterion for choosing the model size, and the value of Cp is reported at each step. From the graph of Cp vs. the number of predictors, Cp is smallest at around 18 predictors (excluding state), so we fit the model with these 18 variables (19 including state) on the training set. The Residuals vs. Fitted graph shows a little heteroskedasticity, but the Q-Q plot shows that the residuals are approximately normal. Finally, we use the testing data set to evaluate prediction accuracy; the test MSE is 0.56.
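
The selection procedure can be sketched as follows. This is an illustrative Python implementation on simulated data, not the report's code: the function names are hypothetical, an intercept column stands in for the forced-in state indicators, and Cp is computed as RSS / sigma^2 - n + 2p, with sigma^2 estimated from the full model and p counting all fitted coefficients.

```python
import random

def rss(cols, y):
    """Residual sum of squares of the least-squares fit of y on cols."""
    p, n = len(cols), len(y)
    # Augmented normal equations [X'X | X'y].
    A = [[sum(cols[i][k] * cols[j][k] for k in range(n)) for j in range(p)]
         + [sum(cols[i][k] * y[k] for k in range(n))] for i in range(p)]
    for i in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * c[k] for b, c in zip(beta, cols)) for k in range(n)]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))

def forward_stepwise_cp(forced, candidates, y):
    """Greedy forward selection; returns [(name, Cp)] in order of entry."""
    n = len(y)
    full = forced + list(candidates.values())
    sigma2 = rss(full, y) / (n - len(full))   # noise estimate from full model
    cols, path = list(forced), []
    remaining = dict(candidates)
    while remaining:
        # Add the candidate that gives the smallest RSS with the current model.
        best = min(remaining, key=lambda v: rss(cols + [remaining[v]], y))
        cols.append(remaining.pop(best))
        cp = rss(cols, y) / sigma2 - n + 2 * len(cols)
        path.append((best, cp))
    return path

# Simulated data: only x0 and x1 affect the response.
rng = random.Random(0)
n = 200
x = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(5)}
y = [2 * x["x0"][k] - x["x1"][k] + rng.gauss(0, 1) for k in range(n)]
intercept = [1.0] * n   # stands in for the forced-in state indicators

path = forward_stepwise_cp([intercept], x, y)
best_size = min(range(len(path)), key=lambda i: path[i][1]) + 1
print([name for name, _ in path[:best_size]])
```

The model size with the smallest Cp is then refitted on the training data, as described above.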

6.4 Lasso

LASSO (Least Absolute Shrinkage and Selection Operator) performs variable selection by minimizing the residual sum of squares plus an L_1 penalty on the coefficients. We use the glmnet package to fit the LASSO. To guarantee that state stays in the model, we set the penalty for the state variable to zero. We use cross-validation to choose the regularization parameter that minimizes the cross-validated mean squared error.

We then evaluate the fitted LASSO model by calculating its MSE on the testing data set, and the test MSE is 0.56.
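
To illustrate how a zero penalty keeps a variable in the model, here is a minimal coordinate-descent sketch in Python. It is not glmnet and not the report's code: the function names and the tiny orthogonal toy design are assumptions, and the per-variable weights play the role that glmnet's penalty.factor argument plays in R. The objective is (1/2n) RSS + lambda * sum_j w_j |beta_j|.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(cols, y, lam, penalty_factor, n_sweeps=200):
    """Coordinate descent for (1/2n)*RSS + lam * sum_j w_j * |beta_j|."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)                      # residuals when all beta are zero
    for _ in range(n_sweeps):
        for j in range(p):
            xj = cols[j]
            zj = sum(v * v for v in xj) / n
            # Covariance of x_j with the partial residual (own effect added back).
            rho = sum(v * r for v, r in zip(xj, resid)) / n + zj * beta[j]
            new = soft_threshold(rho, lam * penalty_factor[j]) / zj
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * v for r, v in zip(resid, xj)]
                beta[j] = new
    return beta

# Toy orthogonal design: y = 3*x0 + 1*x1 + 0.1*x2 exactly.
x0 = [1.0, 1.0, 1.0, 1.0]      # stands in for the unpenalized state term
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [4.1, 2.1, 3.9, 1.9]

beta = lasso_cd([x0, x1, x2], y, lam=0.5, penalty_factor=[0.0, 1.0, 1.0])
print([round(b, 6) for b in beta])
```

In this toy example the unpenalized first coefficient stays at its least-squares value of about 3, the second is shrunk from 1 to about 0.5, and the weak third coefficient is set exactly to zero.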

6.5 Relaxed LASSO

We fit an ordinary least-squares linear model using only the variables selected by the LASSO, which removes the shrinkage on the retained coefficients. This is also referred to as the relaxed LASSO. The testing set MSE for the relaxed LASSO is 0.56.

6.6 Random Forest

Random Forest builds a large number of decorrelated trees, each grown on a bootstrap sample of the training data. At each node of each tree, the split is chosen from a randomly selected subset of the explanatory variables. To build this model, we use the randomForest package. We first tune the parameter mtry, the number of variables considered at each split. The following plot shows how the out-of-bag (OOB) MSE estimate on the training data changes with mtry.

We choose mtry=17. Then we train this model on the training data set and evaluate its prediction accuracy on the testing data set. The test MSE is 0.47.
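
The OOB estimate used for tuning works because each bootstrap sample leaves out a sizable share of the training rows, which can then serve as held-out data for the tree grown on that sample. The following Python sketch (not the randomForest package, and not the report's code) only illustrates that bootstrap property; the function name and seed are assumptions.

```python
import math
import random

def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob

rng = random.Random(0)          # arbitrary seed
n = 2069                        # training-set size reported in Section 6
fracs = [len(bootstrap_split(n, rng)[1]) / n for _ in range(50)]
mean_frac = sum(fracs) / len(fracs)
print(f"mean OOB fraction: {mean_frac:.3f} (exp(-1) = {math.exp(-1):.3f})")
```

On average a little over a third of the rows are out of bag for any given tree, so every observation gets predictions from many trees that never saw it.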

6.7 Boosting

We use the gbm package for the boosting algorithm. We run 10000 boosting iterations, each of which fits a small tree to improve the current fit. The following plot shows how the mean squared error changes with the number of iterations.

The upper red line is the validation set MSE, and the lower black line is the training set MSE. We choose the optimal number of iterations based on the validation MSE. The resulting model is applied to the testing data set, and the test MSE is 0.495.
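
The idea of picking the number of iterations from the validation curve can be sketched with a small squared-error boosting loop. The Python code below is an illustration on simulated one-dimensional data, not the gbm package and not the report's code; the stump learner, the shrinkage of 0.1, and the 300 iterations are assumptions chosen to keep the example short.

```python
import math
import random

def fit_stump(x, r):
    """Best single-split regression stump for residuals r on 1-D x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    total, n = sum(r), len(r)
    best, left_sum = None, 0.0
    for k in range(1, n):
        left_sum += r[order[k - 1]]
        if x[order[k]] == x[order[k - 1]]:
            continue
        left_mean, right_mean = left_sum / k, (total - left_sum) / (n - k)
        # Maximizing this quantity is equivalent to minimizing the split's SSE.
        gain = k * left_mean ** 2 + (n - k) * right_mean ** 2
        if best is None or gain > best[0]:
            cut = (x[order[k]] + x[order[k - 1]]) / 2
            best = (gain, cut, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    cut, left, right = stump
    return left if xi < cut else right

def mse(y, f):
    return sum((a - b) ** 2 for a, b in zip(y, f)) / len(y)

def boost(x_tr, y_tr, x_va, y_va, n_iter=300, shrinkage=0.1):
    """Squared-error boosting; returns per-iteration (train, validation) MSE."""
    f_tr = [sum(y_tr) / len(y_tr)] * len(y_tr)
    f_va = [f_tr[0]] * len(y_va)
    history = []
    for _ in range(n_iter):
        resid = [a - b for a, b in zip(y_tr, f_tr)]
        stump = fit_stump(x_tr, resid)
        f_tr = [f + shrinkage * predict_stump(stump, xi) for f, xi in zip(f_tr, x_tr)]
        f_va = [f + shrinkage * predict_stump(stump, xi) for f, xi in zip(f_va, x_va)]
        history.append((mse(y_tr, f_tr), mse(y_va, f_va)))
    return history

# Simulated data: a noisy sine curve.
rng = random.Random(0)
def simulate(m):
    xs = [rng.uniform(0, 6) for _ in range(m)]
    return xs, [math.sin(v) + rng.gauss(0, 0.3) for v in xs]

x_tr, y_tr = simulate(150)
x_va, y_va = simulate(75)
history = boost(x_tr, y_tr, x_va, y_va)
best_iter = min(range(len(history)), key=lambda i: history[i][1]) + 1
print("iterations chosen by validation MSE:", best_iter)
```

The training MSE keeps decreasing with more iterations, so only the validation curve is informative about when to stop.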

6.8 Deep Learning Networks

We use the keras package to train deep learning networks. We experiment with a number of network structures: a single hidden layer with 40 or 80 nodes; two hidden layers with 20/20, 40/20, or 40/40 nodes; and three hidden layers with 10/10/10, 20/20/20, 40/20/20, or 40/40/40 nodes.

We use plots of the MSE on the validation data set to choose the network structure and the optimal number of epochs. The validation MSE is quite noisy from epoch to epoch, so we include a smooth trend line in each plot. For illustration, we show below the plots for the network with one hidden layer of 80 nodes and for the network with three hidden layers 10/10/10.

Based on validation MSE, we choose the network with three hidden layers 10/10/10 among all the network structures we considered, and we choose epoch number 500 for it. We train the model on the full training data set and test on the test data set. On the test data, the MSE is 0.537.
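
Choosing the epoch from a smoothed rather than raw validation curve avoids picking an epoch that looks good only because of a one-off dip. The Python sketch below illustrates this with a centered moving average, which is an assumed stand-in for whatever smoother produced the trend lines in the report; the function names and the validation values are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks near the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def best_epoch(val_mse, window=5):
    """1-based epoch at which the smoothed validation MSE is smallest."""
    smooth = moving_average(val_mse, window)
    return min(range(len(smooth)), key=smooth.__getitem__) + 1

# Hypothetical validation MSE per epoch, with an isolated dip at epoch 3.
val_mse = [1.0, 0.9, 0.3, 0.9, 0.8, 0.6, 0.55, 0.6, 0.7, 0.8]
raw_best = min(range(len(val_mse)), key=val_mse.__getitem__) + 1
print("raw minimum at epoch", raw_best)                          # 3
print("smoothed minimum at epoch", best_epoch(val_mse, window=3))  # 7
```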

7 Summary

Using county level data, we studied the relation between the (log) death rate and socioeconomic characteristics of continental US counties. We explored the variables at state and county levels with statistical summary plots as well as maps. We kept the state as an explanatory variable in our analysis to take into account the impact of state related differences. We found that the data confirm several observations made in media reports: that the COVID-19 pandemic had more devastating impacts on older populations, populations with higher percentages of black and hispanic people, and populations with lower income level.

We used machine learning methods to build prediction models, and compared them based on test data mean squared error. The methods we used were linear model with forward stepwise variable selection, LASSO and relaxed LASSO, Random Forest, Boosting, and deep learning networks. The following table summarizes the testing data set mean squared error of these methods along with those of a few simpler baseline linear models.

The Testing Set Mean Squared Error

Method                            Test_MSE
Null Model with No Variables      1.047
Linear Model: State               0.730
Linear Model: State and Age       0.727
Linear Model: State and Race      0.650
Linear Model: State and Income    0.723
Forward Stepwise Selection        0.563
LASSO                             0.561
Relaxed LASSO                     0.562
Random Forest                     0.470
Boosting                          0.495
Deep Learning                     0.537

Random Forest and Boosting achieved the lowest test MSE (0.470 and 0.495, respectively), ahead of the deep learning network (0.537) and the linear-model-based selection methods (about 0.56). While the analysis only establishes association and prediction, not cause and effect, the results can provide valuable guidance on issues such as resource allocation in the case of potential future waves of similar diseases.
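
One way to read the table is to express each test MSE as a percentage reduction relative to the null model, 1 - MSE / MSE_null. The short Python sketch below does this arithmetic using only the test MSE values reported above; the reduction figure itself is a derived quantity that does not appear in the original analysis.

```python
# Test MSE values copied from the summary table.
test_mse = {
    "Null Model with No Variables": 1.047,
    "Linear Model: State": 0.730,
    "Linear Model: State and Age": 0.727,
    "Linear Model: State and Race": 0.650,
    "Linear Model: State and Income": 0.723,
    "Forward Stepwise Selection": 0.563,
    "LASSO": 0.561,
    "Relaxed LASSO": 0.562,
    "Random Forest": 0.470,
    "Boosting": 0.495,
    "Deep Learning": 0.537,
}

null_mse = test_mse["Null Model with No Variables"]
reduction = {m: 1 - v / null_mse for m, v in test_mse.items()}
for method, frac in reduction.items():
    print(f"{method:32s} {test_mse[method]:.3f}  {100 * frac:5.1f}%")
```

By this measure, state alone removes roughly 30% of the null model's test MSE, and Random Forest removes roughly 55%.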

8 Appendix

1. County-level infection and fatality data - https://www.kaggle.com/fireballbyedimyrnmom/us-counties-covid-19-dataset/data#  
    This dataset gives daily cumulative numbers on infection and fatality for each county.   
              1         date                Date
              2         county              County name
              3         state               State name
              4         fips                County code that uniquely identifies a county
              5         cases               Number of cumulative infections
              6         deaths              Number of cumulative deaths
2. County-level socioeconomic data - https://www.ers.usda.gov/data-products/atlas-of-rural-and-small-town-america/download-the-data/  
    The following are the five relevant datasets from this site.   
        i. Income - Poverty level and household income. The variables are:  
              
              1       PovertyUnder18Pct      Poverty rate for children age 0-17, 2018  
              2       Deep_Pov_All           Deep poverty, 2014-18  
              3       Deep_Pov_Children      Deep poverty for children, 2014-18  
              4       PovertyAllAgesPct      Poverty rate, 2018  
              5       MedHHInc               Median household income, 2018 (In 2018 dollars)  
              6       PerCapitaInc           Per capita income in the past 12 months   
                                             (In 2018 inflation adjusted dollars), 2014-18  
              7       PovertyAllAgesNum      Number of people of all ages in poverty, 2018  
              8       PovertyUnder18Num      Number of people age 0-17 in poverty, 2018    
        ii. Jobs - Employment type, rate, and change  
        
              19        UnempRate2007        Unemployment rate, 2007  
              28        UnempRate2008        Unemployment rate, 2008  
              35        UnempRate2009        Unemployment rate, 2009  
              13        UnempRate2010        Unemployment rate, 2010  
              62        UnempRate2011        Unemployment rate, 2011  
              63        UnempRate2012        Unemployment rate, 2012  
              65        UnempRate2013        Unemployment rate, 2013  
              14        UnempRate2014        Unemployment rate, 2014  
              22        UnempRate2015        Unemployment rate, 2015  
              2         UnempRate2016        Unemployment rate, 2016  
              5         UnempRate2017        Unemployment rate, 2017  
              4         UnempRate2018        Unemployment rate, 2018  
              3         UnempRate2019        Unemployment rate, 2019  
              
              37      NumEmployed2007        Employed, 2007  
              26      NumEmployed2008        Employed, 2008  
              31      NumEmployed2009        Employed, 2009  
              32      NumEmployed2010        Employed, 2010  
              67      NumEmployed2011        Employed, 2011    
              64      NumEmployed2012        Employed, 2012  
              48      NumEmployed2013        Employed, 2013  
              43      NumEmployed2014        Employed, 2014  
              59      NumEmployed2015        Employed, 2015  
              49      NumEmployed2016        Employed, 2016  
              52      NumEmployed2017        Employed, 2017  
              55      NumEmployed2018        Employed, 2018  
              60      NumEmployed2019        Employed, 2019  

              38    NumUnemployed2007        Unemployed, 2007  
              27    NumUnemployed2008        Unemployed, 2008  
              34    NumUnemployed2009        Unemployed, 2009  
              29    NumUnemployed2010        Unemployed, 2010  
              25    NumUnemployed2011        Unemployed, 2011  
              66    NumUnemployed2012        Unemployed, 2012  
              40    NumUnemployed2013        Unemployed, 2013  
              44    NumUnemployed2014        Unemployed, 2014  
              45    NumUnemployed2015        Unemployed, 2015  
              50    NumUnemployed2016        Unemployed, 2016  
              53    NumUnemployed2017        Unemployed, 2017  
              56    NumUnemployed2018        Unemployed, 2018  
              58    NumUnemployed2019        Unemployed, 2019  

              1      PctEmpChange1019        Percent employment change, 2010-19  
              11     PctEmpChange1819        Percent employment change, 2018-19  
              20     PctEmpChange0719        Percent employment change, 2007-19  
              21     PctEmpChange0710        Percent employment change, 2007-10  

              23       NumCivEmployed        Civilian employed population 16 years and over,   
                                             2014-18  
                                             
              46 NumCivLaborforce2007        Civilian labor force, 2007  
              24 NumCivLaborForce2008        Civilian labor force, 2008  
              30 NumCivLaborForce2009        Civilian labor force, 2009  
              33 NumCivLaborForce2010        Civilian labor force, 2010  
              41 NumCivLaborForce2011        Civilian labor force, 2011  
              61 NumCivLaborForce2012        Civilian labor force, 2012  
              39 NumCivLaborforce2013        Civilian labor force, 2013  
              42 NumCivLaborforce2014        Civilian labor force, 2014  
              47 NumCivLaborforce2015        Civilian labor force, 2015  
              36 NumCivLaborforce2016        Civilian labor force, 2016  
              51 NumCivLaborforce2017        Civilian labor force, 2017  
              54 NumCivLaborforce2018        Civilian labor force, 2018  
              57 NumCivLaborforce2019        Civilian labor force, 2019  

              6            PctEmpFIRE        Percent of the civilian labor force 16 and over    
                                             employed in finance and insurance, and real estate   
                                             and rental and leasing, 2014-18    
              7    PctEmpConstruction        Percent of the civilian labor force 16 and over   
                                             employed in construction, 2014-18  
              8           PctEmpTrans        Percent of the civilian labor force 16 and over     
                                             employed in transportation, warehousing and   
                                             utilities, 2014-18   
              9          PctEmpMining        Percent of the civilian labor force 16 and over    
                                             employed in mining, quarrying, oil and gas extraction, 
                                             2014-18  
              10          PctEmpTrade        Percent of the civilian labor force 16 and over    
                                             employed in wholesale and retail trade, 2014-18  
              12    PctEmpInformation        Percent of the civilian labor force 16 and over    
                                             employed in information services, 2014-18   
              15    PctEmpAgriculture        Percent of the civilian labor force 16 and over    
                                             employed in agriculture, forestry, fishing, and   
                                             hunting, 2014-18   
              16  PctEmpManufacturing        Percent of the civilian labor force 16 and over   
                                             employed in manufacturing, 2014-18    
              17       PctEmpServices        Percent of the civilian labor force 16 and over   
                                             employed in services, 2014-18    
              18           PctEmpGovt        Percent of the civilian labor force 16 and over        
                                             employed in public administration, 2014-18  

        iii. People - Population size, density, education level, race, age, household size, and migration rates  

              30                             PopDensity2010           Population density, 2010  
              54                        LandAreaSQMiles2010           Land area in square miles,   
                                                                      2010 
              50                                    TotalHH           Total number of households,   
                                                                      2014-18  
              47                                 TotalOccHU           Total number of occupied   
                                                                      housing units, 2014-18  
              21                                  AvgHHSize           Average household size,   
                                                                      2014-18   
              59                                 OwnHomeNum           Number of owner occupied   
                                                                      housing units, 2014-18  
              10                                 OwnHomePct           Percent of owner occupied   
                                                                      housing units, 2014-18  

              7                             NonEnglishHHPct           Percent of non-English   
                                                                      speaking households of   
                                                                      total households, 2014-18     
              9                            HH65PlusAlonePct           Percent of persons 65 or   
                                                                      older living alone, 2014-18  
              15                                FemaleHHPct           Percent of female headed  
                                                                      family households of total   
                                                                      households, 2014-18  
              51                                FemaleHHNum           Number of female headed   
                                                                      family households, 2014-18  
              55                            NonEnglishHHNum           Number of non-English   
                                                                      speaking households, 2014-18  
              56                           HH65PlusAloneNum           Number of persons 65 years  
                                                                      or older living alone, 2014-18

              2                        Age65AndOlderPct2010           Percent of population 65 or   
                                                                      older, 2010   
              76                       Age65AndOlderNum2010           Population 65 years or older,
                                                                      2010  
              62                             TotalPop25Plus           Total population 25 and older,
                                                                      2014-18 - 5-year average  
              5                              Under18Pct2010           Percent of population under  
                                                                      age 18, 2010  
              71                             Under18Num2010           Population under age 18, 2010

              24                           Ed1LessThanHSPct           Percent of persons with no  
                                                                      high school diploma or GED,  
                                                                      adults 25 and over, 2014-18  
              12                        Ed2HSDiplomaOnlyPct           Percent of persons with a   
                                                                      high school diploma or GED   
                                                                      only, adults 25 and over,   
                                                                      2014-18    
              1                           Ed3SomeCollegePct           Percent of persons with some  
                                                                      college experience, adults 25
                                                                      and over, 2014-18    
              8                           Ed4AssocDegreePct           Percent of persons with   
                                                                      an associate's degree,   
                                                                      adults 25 and over,   
                                                                      2014-18    
              14                          Ed5CollegePlusPct           Percent of persons with a   
                                                                      4-year college degree or   
                                                                      more, adults 25 and over,   
                                                                      2014-18    
                                                                      
              77                           Ed1LessThanHSNum           No high school, adults 25 and
                                                                      over, 2014-18   
              67                        Ed2HSDiplomaOnlyNum           High school only, adults 25   
                                                                      and over, 2014-18  
              66                          Ed3SomeCollegeNum           Some college experience,   
                                                                      adults 25 and over, 2014-18   
              69                          Ed4AssocDegreeNum           Number of persons with an  
                                                                      associate's degree, adults 25
                                                                      and over, 2014-18   
              87                          Ed5CollegePlusNum           College degree 4-years or    
                                                                      more, adults 25 and over,   
                                                                      2014-18   

              16                             ForeignBornPct           Percent of total population   
                                                                      foreign born, 2014-18  
              17                       ForeignBornEuropePct           Percent of persons born in   
                                                                      Europe, 2014-18  
              19                          ForeignBornMexPct           Percent of persons born in   
                                                                      Mexico, 2014-18   
              46               ForeignBornCentralSouthAmPct           Percent of persons born in   
                                                                      Central or South America,   
                                                                      2014-18  
              79                         ForeignBornAsiaPct           Percent of persons born in   
                                                                      Asia, 2014-18   
              80                        ForeignBornCaribPct           Percent of persons born in  
                                                                      the Caribbean, 2014-18  
              82                       ForeignBornAfricaPct           Percent of persons born in  
                                                                      Africa, 2014-18   
                                                                      
              70                             ForeignBornNum           Number of people foreign born,
                                                                      2014-18  
              52               ForeignBornCentralSouthAmNum           Number of persons born in   
                                                                      Central or South America,   
                                                                      2014-18   
              63                       ForeignBornEuropeNum           Number of persons born in   
                                                                      Europe, 2014-18  
              68                          ForeignBornMexNum           Number of persons born in   
                                                                      Mexico, 2014-18   
              81                       ForeignBornAfricaNum           Number of persons born in   
                                                                      Africa, 2014-18  
              85                         ForeignBornAsiaNum           Number of persons born in    
                                                                      Asia, 2014-18   
              86                        ForeignBornCaribNum           Number of persons born in the
                                                                      Caribbean, 2014-18   

              13 Net_International_Migration_Rate_2010_2019           Net international migration  
                                                                      rate, 2010-19  
              83      Net_International_Migration_2010_2019           Net international migration,  
                                                                      2010-19   
              84      Net_International_Migration_2000_2010           Net international migration,  
                                                                      2000-10  
              22                 Immigration_Rate_2000_2010           Net international migration   
                                                                      rate, 2000-10   
              20                       NetMigrationRate0010           Net migration rate, 2000-10   
              36                       NetMigrationRate1019           Net migration rate, 2010-19  
              49                        NetMigrationNum0010           Net migration, 2000-10  
              78                           NetMigration1019           Net Migration, 2010-19   

              18                      NaturalChangeRate1019           Natural population change   
                                                                      rate, 2010-19   
              40                      NaturalChangeRate0010           Natural population change   
                                                                      rate, 2000-10    
              48                       NaturalChangeNum0010           Natural change, 2000-10  
              65                          NaturalChange1019           Natural population change,   
                                                                      2010-19   

              45                               TotalPop2010           Population size 4/1/2010   
                                                                      Census 
              53                            TotalPopEst2010           Population size 7/1/2010  
              61                            TotalPopEst2011           Population size 7/1/2011  
              58                            TotalPopEst2012           Population size 7/1/2012  
              57                            TotalPopEst2013           Population size 7/1/2013  
              73                            TotalPopEst2014           Population size 7/1/2014  
              75                            TotalPopEst2015           Population size 7/1/2015  
              74                            TotalPopEst2016           Population size 7/1/2016  
              72                            TotalPopEst2017           Population size 7/1/2017  
              88                            TotalPopEst2018           Population size 7/1/2018     
              29                            TotalPopEst2019           Population size 7/1/2019  
              64                                TotalPopACS           Total population, 2014-18  
                                                                      - 5-year average   
              60                        TotalPopEstBase2010           County Population estimate   
                                                                      base 4/1/2010  

              3           NonHispanicAsianPopChangeRate0010           Population change rate   
                                                                      Non-Hispanic Asian, 2000-10  
              4                           PopChangeRate1819           Population change rate,   
                                                                      2018-19    
              6                           PopChangeRate1019           Population change rate,   
                                                                      2010-19    
              26                          PopChangeRate0010           Population change rate,   
                                                                      2000-10   
              27 NonHispanicNativeAmericanPopChangeRate0010           Population change rate   
                                                                      Non-Hispanic Native    
                                                                      American, 2000-10    
              28                  HispanicPopChangeRate0010           Population change rate   
                                                                      Hispanic, 2000-10  
              32              MultipleRacePopChangeRate0010           Population change rate   
                                                                      multiple race, 2000-10    
              43          NonHispanicWhitePopChangeRate0010           Population change rate   
                                                                      Non-Hispanic White, 2000-10  
              44          NonHispanicBlackPopChangeRate0010           Population change rate  
                                                                      Non-Hispanic African American,
                                                                      2000-10  

              38                        MultipleRacePct2010           Percent multiple race, 2010  
              23                    WhiteNonHispanicPct2010           Percent Non-Hispanic White,   
                                                                      2010    
              25           NativeAmericanNonHispanicPct2010           Percent Non-Hispanic Native   
                                                                      American, 2010  
              31                    BlackNonHispanicPct2010           Percent Non-Hispanic African  
                                                                      American, 2010    
              35                    AsianNonHispanicPct2010           Percent Non-Hispanic Asian,   
                                                                      2010   
              37                            HispanicPct2010           Percent Hispanic, 2010  

              11                        MultipleRaceNum2010           Population size multiple   
                                                                      race, 2010   
              33                    WhiteNonHispanicNum2010           Population size   
                                                                      Non-Hispanic White, 2010    
              34                    BlackNonHispanicNum2010           Population size Non-Hispanic  
                                                                      African American, 2010  
              39           NativeAmericanNonHispanicNum2010           Population size Non-Hispanic  
                                                                      Native American, 2010   
              42                    AsianNonHispanicNum2010           Population size   
                                                                      Non-Hispanic Asian, 2010    
              41                            HispanicNum2010           Population size Hispanic,   
                                                                      2010    

        iv. County Classifications - Type of county (economic typology, position on the rural-urban continuum, and indicators such as persistent poverty and low education)   

              1             Type_2015_Recreation_NO           Recreation counties, 2015 edition  
              3                Type_2015_Farming_NO           Farming-dependent counties, 2015   
                                                              edition  
              4                 Type_2015_Mining_NO           Mining-dependent counties, 2015   
                                                              edition  
              5             Type_2015_Government_NO           Federal/State government-dependent   
                                                              counties, 2015 edition  
              11                   Type_2015_Update           County typology economic types,   
                                                              2015 edition   
              14         Type_2015_Manufacturing_NO           Manufacturing-dependent counties,   
                                                              2015 edition  
              42        Type_2015_Nonspecialized_NO           Nonspecialized counties,   
                                                              2015 edition    
              33            RecreationDependent2000           Nonmetro recreation-dependent,   
                                                              1997-00  
              34         ManufacturingDependent2000           Manufacturing-dependent,   
                                                              1998-00  
              35                  FarmDependent2003           Farm-dependent, 1998-00  
              36             EconomicDependence2000           Economic dependence, 1998-00  

              2         RuralUrbanContinuumCode2003           Rural-urban continuum code, 2003  
              6              UrbanInfluenceCode2003           Urban influence code, 2003  
              16        RuralUrbanContinuumCode2013           Rural-urban continuum code, 2013  
              17             UrbanInfluenceCode2013           Urban influence code, 2013  

              41                        Noncore2013           Nonmetro noncore, outside   
                                                              Micropolitan and Metropolitan,   
                                                              2013  
              22                   Micropolitan2013           Micropolitan, 2013  
              23                       Nonmetro2013           Nonmetro, 2013  
              24                          Metro2013           Metro, 2013  
              30                 Metro_Adjacent2013           Nonmetro, adjacent to metro   
                                                              area, 2013  
              31                        Noncore2003           Nonmetro noncore, outside   
                                                              Micropolitan and Metropolitan,   
                                                              2003  
              38                   Micropolitan2003           Micropolitan, 2003  
              39                          Metro2003           Metro, 2003  
              40                       Nonmetro2003           Nonmetro, 2003  
              26                 NonmetroNotAdj2003           Nonmetro, nonadjacent to metro   
                                                              area, 2003  
              37                    NonmetroAdj2003           Nonmetro, adjacent to metro  
                                                              area, 2003  

              15                     Oil_Gas_Change           Change in the value of onshore oil   
                                                              and natural gas production, 2000-11  
              20                         Gas_Change           Change in the value of onshore   
                                                              natural gas production, 2000-11    
              25                         Oil_Change           Change in the value of onshore   
                                                              oil production, 2000-11  

              18                              Hipov           High poverty counties, 2014-18  
              19                   Perpov_1980_0711           Persistent poverty counties,   
                                                              2015 edition  
              21   PersistentChildPoverty_1980_2011           Persistent child poverty   
                                                              counties, 2015 edition  
              27         PersistentChildPoverty2004           Persistent child poverty   
                                                              counties, 2004  
              29              PersistentPoverty2000           Persistent poverty counties,  
                                                              2004
                                                              
              7           Low_Education_2015_update           Low education counties, 2015 edition  
              28                   LowEducation2000           Low education, 2000  

              12                HiCreativeClass2000           Creative class, 2000  
              13                          HiAmenity           High natural amenities  
              32          RetirementDestination2000           Retirement destination,  
                                                              1990-00  

              8          Low_Employment_2015_update           Low employment counties, 2015 edition
              9         Population_loss_2015_update           Population loss counties, 2015 edition
              10 Retirement_Destination_2015_Update           Retirement destination counties, 2015
                                                              edition  

        v. Variable Name Lookup - Brief explanations of variable names